Load Balancing Performance of Dynamic Scheduling on NUMA Multiprocessors
Abstract
Self-scheduling is a method for task scheduling in parallel programs in which each processor acquires a new block of tasks for execution whenever it becomes idle. To get the best performance, the block size must be chosen to balance the scheduling overhead against the load imbalance. To determine the best block size, a better understanding of the role of load imbalance in self-scheduling performance is needed. In this paper we study the effect of memory contention on task duration distributions, and hence load balancing, in self-scheduling on a Non-Uniform Memory Access (NUMA) machine. Experimental studies on a BBN TC are used to reveal the strengths and weaknesses of analytical performance models to predict running time and optimal block size. The models are shown to be very accurate for small block sizes. However, the models fail when the block size is large due to a previously unrecognized source of load imbalance. We extend the analytical models to address this failure. The implications for the construction of compilers and runtime systems are discussed.
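The trade-off between scheduling overhead and load imbalance described in the abstract can be illustrated with a minimal simulation sketch. It is not the paper's analytical model: the function name `self_schedule_makespan`, the fixed per-acquisition `overhead`, and the task durations are all assumptions made for illustration, and memory contention is not modeled.

```python
import heapq


def self_schedule_makespan(durations, num_procs, block_size, overhead):
    """Simulate self-scheduling; return the time the last processor finishes.

    Hypothetical model: every block acquisition costs a fixed `overhead`,
    and an idle processor always takes the next `block_size` tasks.
    """
    # Heap of (time at which the processor becomes idle, processor id).
    idle_at = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(idle_at)

    next_task = 0
    finish = 0.0
    while next_task < len(durations):
        # The processor that becomes idle first acquires the next block.
        now, proc = heapq.heappop(idle_at)
        block = durations[next_task:next_task + block_size]
        next_task += len(block)

        done = now + overhead + sum(block)
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, proc))
    return finish


if __name__ == "__main__":
    tasks = [1.0] * 8  # assumed: eight tasks of equal duration
    for k in (1, 4, 8):
        print(k, self_schedule_makespan(tasks, num_procs=2, block_size=k, overhead=0.5))
```

With these assumed inputs, block size 1 pays the acquisition overhead eight times (finishing at 6.0), block size 8 leaves one processor idle (8.5), and block size 4 balances the two costs (4.5).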
Similar resources
An Extended Gradient Model for NUMA Multiprocessor Systems
In this paper, we present the design and implementation of an effective and scalable dynamic load balancing system for Non-Uniform Memory Access (NUMA) multiprocessors where load balancing is a key issue to achieve adequate efficiency. The proposed load balancing algorithm extends the well-known gradient model to enhance its applicability in a wide range of multiprocessor systems and to improve th...
Hierarchical loop scheduling for clustered NUMA machines
Loop scheduling is an important issue in the development of high performance multiprocessors. As modern multiprocessors have high and non-uniform memory access (NUMA) costs, the communication costs dominate the execution of parallel programs. Previous affinity algorithms perform better than dynamic algorithms under non-clustered NUMA multiprocessors, but they suffer heavy overheads when migrating ...
Locality-Preserving Dynamic Load Balancing for Data-Parallel Applications on Distributed-Memory Multiprocessors
Load balancing and data locality are the two most important factors affecting the performance of parallel programs running on distributed-memory multiprocessors. A good balancing scheme should evenly distribute the workload among the available processors, and locate the tasks close to their data to reduce communication and idle time. In this paper, we study the load balancing problem of data-pa...
Parallel Classification for Data Mining on Shared-Memory Multiprocessors
We present parallel algorithms for building decision-tree classifiers on shared-memory multiprocessor (SMP) systems. The proposed algorithms span the gamut of data and task parallelism. The data parallelism is based on attribute scheduling among processors. This basic scheme is extended with task pipelining and dynamic load balancing to yield faster implementations. The task parallel approach u...
Multiprogrammed Parallel Application Scheduling in NUMA Multiprocessors
The invention, acceptance, and proliferation of multiprocessors are primarily a result of the quest to increase computer system performance. The most promising features of multiprocessors are their potential to solve problems faster than previously possible and to solve larger problems than previously possible. Large-scale multiprocessors offer the additional advantage of being able to execute ...
Publication date: 1997